back to overview    

Preparation: Please open your RStudio project and download the new data (‘Phylogeny_ConservativeCon_Checklist.zip’, ‘palms_specsxsites_phylo.csv’ and ‘palmtree_pruned.nex’) for today from Stud-IP and unzip and copy them into your .Rproj folder ‘/Data/’. You can use getwd() to locate your current working directory, which should be your project folder. Please install the following R-packages using install.packages():

 

Load R packages & island data set

library(plyr) # basic data manipulation

library(rgdal) #  Geospatial fun
library(rgeos) #  Geospatial fun
library(maptools)# Geospatial fun

library(Taxonstand) # Taxonomic standardisation of plant species names
library(ape) # Analyses of phylogenetics and evolution
library(picante) # Phylogenies and ecology
library(pez) # Phylogenies and ecology

library(psych) # basic stats stuff
library(classInt) # for plotting nice colors
library(colorRamps) # for plotting nice colors

 

1. Load data into R

a. Load a phylogeny & look at structure of ‘multiPhylo’ object

load Multi tree object into R

Data from Faurby, S., Eiserhardt, W.L., Baker, W.J. & Svenning, J.-C. (2016).
An all-evidence species-level supertree for the palms (Arecaceae).
Molecular Phylogenetics and Evolution, 100, 57-69.

palmtrees <- read.nexus("Data/Phylogeny_ConservativeCon_Checklist.nex")

palmtrees
## 1000 phylogenetic trees
palmtrees[[1]]
## 
## Phylogenetic tree with 2539 tips and 2538 internal nodes.
## 
## Tip labels:
##  Nypa_fruticans, Sabal_mauritiiformis, Sabal_pumos, Sabal_domingensis, Sabal_minor, Sabal_palmetto, ...
## 
## Rooted; includes branch lengths.
palmtree <- palmtrees[[1]]
str(palmtree)
## List of 4
##  $ edge       : int [1:5076, 1:2] 2540 2541 2541 2542 2543 2544 2545 2546 2546 2547 ...
##  $ edge.length: num [1:5076] 8.89 95.72 9.44 9.78 12.25 ...
##  $ Nnode      : int 2538
##  $ tip.label  : chr [1:2539] "Nypa_fruticans" "Sabal_mauritiiformis" "Sabal_pumos" "Sabal_domingensis" ...
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"
head(palmtree$tip.label)
## [1] "Nypa_fruticans"       "Sabal_mauritiiformis" "Sabal_pumos"         
## [4] "Sabal_domingensis"    "Sabal_minor"          "Sabal_palmetto"
length(palmtree$tip.label)
## [1] 2539
is.rooted(palmtree) # does the phylogeny have a root? i.e. can the most basal ancestor be identified?
## [1] TRUE

 

Explanation of structure of a phylogeny in R
edge: a two-column matrix where each row represents a branch (or edge) of the tree. The nodes and the tips are symbolized with integers.The n tips are numbered from 1 to n, and the m (internal) nodes from n+1 to n+m (the root being n + 1).For each row, the first column gives the ancestor.

edge.length (optional): A numeric vector giving the lengths of the branches given by edge.

tip.label: A vector of mode character giving the labels of the tips. The order of these labels corresponds to the integers 1 to n in edge.

Nnode: An integer value giving the number of nodes in the tree (m).

node.label (optional): A vector of mode character giving the labels of the nodes (ordered in the same way as the tip.label).

root.edge (optional): A numeric value giving the length of the branch at the root if it exists.

plot(palmtree,type="fan",cex=0.3, edge.color="gray70",tip.color="#ef8a62")

 

b. Loading palm distribution data

Data from Kreft, H., Sommer, J.H. & Barthlott, W. (2006).
The significance of geographic range size for spatial diversity
patterns in Neotropical palms. Ecography, 29, 21-30.

species <- read.csv("Data/palms_species_per_gridcell.csv",sep=",", stringsAsFactors = FALSE)
head(species)
##   grid_id spp_Id    GENUS EPITHET
## 1   13735    550 Ammandra natalia
## 2   13736    550 Ammandra natalia
## 3   13737    550 Ammandra natalia
## 4   13738    550 Ammandra natalia
## 5   13910    550 Ammandra natalia
## 6   13911    550 Ammandra natalia
length(unique(species$grid_id)) # number of grid cells
## [1] 6638

 

c. Exercise I

Tasks:
1. Add a column for the full species name with "_" as separator: “species”.
2. How many palm species occur in the Americas?
3. How many and which species in the palm distribution dataset are missing from the phylogeny?
 

Exercise I solutions:
Add a column for the full species name with a hyphen as separator: “species”

## [1] "Ammandra_natalia"        "Ammandra_decasperma"    
## [3] "Ammandra_dasyneura"      "Phytelephas_tumacana"   
## [5] "Phytelephas_tenuicaulis"

 

How many palm species occur in the Americas?

## [1] 550

 

How many and which species in the palm distribution dataset are missing from the phylogeny?

## [1] 80
## [1] "Ammandra_natalia"    "Ammandra_dasyneura"  "Geonoma_weberbaueri"
## [4] "Geonoma_rubescens"   "Geonoma_polyandra"

 

2. Taxonomic standardization

Write species missing from phylogeny into a vector

specmissing <- unique(species$species[which(!species$species %in% palmtree$tip.label)])

specmissing <- gsub("_"," ",specmissing) # # replace underscore with a space

 

Match palm species names to those in The Plant List

taxstand <- TPL(specmissing, diffchar = 2, max.distance = 1)
head(taxstand)[,11:16]
##   TPL.version Taxonomic.status    Family New.Genus New.Hybrid.marker
## 1         1.1          Synonym Arecaceae  Aphandra                  
## 2         1.1          Synonym Arecaceae  Ammandra                  
## 3         1.1          Synonym Arecaceae   Geonoma                  
## 4         1.1          Synonym Arecaceae   Geonoma                  
## 5         1.1          Synonym Arecaceae   Geonoma                  
## 6         1.1         Accepted Arecaceae   Geonoma                  
##   New.Species
## 1     natalia
## 2  decasperma
## 3      undata
## 4    pohliana
## 5  multisecta
## 6 poeppigiana

 

Replace old names by new names & overwrite the combined species name

for (i in 1:nrow(taxstand)){
  species$GENUS[which(species$GENUS == taxstand$Genus[i] & species$EPITHET == taxstand$Species[i])] <- taxstand$New.Genus[i]
  species$EPITHET[which(species$GENUS == taxstand$Genus[i] & species$EPITHET == taxstand$Species[i])] <- taxstand$New.Species[i]
}

species$species <- paste(species$GENUS,species$EPITHET, sep="_") # overwrite the combined species name

 

Check names that are not in the phylogeny

specmissing <- unique(species$species[which(!species$species %in% palmtree$tip.label)])
specmissing
##  [1] "Geonoma_paraguensis"          "Calyptrogyne_kunaria"        
##  [3] "Desmoncus_phoenicocarpus"     "Desmoncus_cirrhiferus"       
##  [5] "Bactris_horrispatha"          "Bactris_hondurensis"         
##  [7] "Gastrococcus_crispa"          "Polyandrococcus_caudescens"  
##  [9] "Hypospathe_altissima"         "Hypospathe_macrorachis"      
## [11] "Hypospathe_elegans"           "Prestoea_longipetiolata"     
## [13] "Dictyocarpum_ptarianum"       "Dictyocarpum_lamarckianum"   
## [15] "Dictyocarpum_fuscum"          "Chamaedorea_selvae"          
## [17] "Chamaedorea_ernesti-angustii" "Acoelorraphe_wrightii"       
## [19] "Coplothrinax_wrightii"        "Coplothrinax_cookii"         
## [21] "Crysophila_williamsii"        "Crysophila_warscewiczii"     
## [23] "Crysophila_stauracantha"      "Crysophila_nana"             
## [25] "Crysophila_macrocarpa"        "Crysophila_kalbreyeri"       
## [27] "Crysophila_guagara"           "Crysophila_grayumii"         
## [29] "Crysophila_cookii"
length(specmissing)
## [1] 29
species <- species[which(!species$species %in% specmissing),]

 

Missing species would have to be looked up and added manually to the phylogeny by an expert.
If missing species cannot be added reliably, one can remove them from the dataset for analysis.

3. Loading grid shapefile and changing resolution

Read in polygon shapefile

americas <- readOGR("Data/americas.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/home/dylan/Dropbox (iDiv)/Teaching/MCMMB/Data/americas.shp", layer: "americas"
## with 1 features
## It has 15 fields
## Integer64 fields read as strings:  POP_CNTRY
grid <- readOGR("Data/30min_grid_select50%.shp", integer64="allow.loss")
## OGR data source with driver: ESRI Shapefile 
## Source: "/home/dylan/Dropbox (iDiv)/Teaching/MCMMB/Data/30min_grid_select50%.shp", layer: "30min_grid_select50%"
## with 6038 features
## It has 3 fields
## Integer64 fields read as signed 32-bit integers:  ID
names(grid@data)[1] <- "grid_id"

 

Convert the grid shapefile to a coarser resolution to have fewer grid cells

grid_centroids <- gCentroid(grid, byid=TRUE) # make spatial points dataframe from grid
grid_centroids <- SpatialPointsDataFrame(grid_centroids, data = data.frame(grid_centroids@coords))
names(grid_centroids@data) <- c("longitude","latitude")
par(mfrow=c(1,1))
plot(grid_centroids)
plot(americas, add=TRUE)

 

Add coordinates to SpatialPolygonsDataframe

grid@data <- cbind(grid@data,grid_centroids@data)
head(grid@data)
##   grid_id AREA PORTION longitude latitude
## 0     538 0.25     100 -114.2507    36.25
## 1     539 0.25     100 -113.7507    36.25
## 2     540 0.25     100 -113.2507    36.25
## 3     712 0.25     100 -114.7507    35.75
## 4     713 0.25     100 -114.2507    35.75
## 5     714 0.25     100 -113.7507    35.75

 

Make new IDs for neighbouring cells

range(grid@data$longitude)
## [1] -115.75066  -35.25066
range(grid@data$latitude)
## [1] -34.75  36.25
longitude <- seq(-116,-34, by=2)
latitude  <- seq(-35,35, by=2)

new_ID_matrix <- matrix(c(1:(length(longitude)*length(latitude))), nrow=length(longitude), ncol=length(latitude) , dimnames = list(longitude, latitude), byrow=TRUE)

grid@data$new_ID <- NA

for(i in 1:nrow(grid@data)){
  grid@data$new_ID[i] <-   new_ID_matrix[which(longitude < grid@data$longitude[i] & (longitude + 2) > grid@data$longitude[i]), which(latitude < grid@data$latitude[i] & (latitude + 2) > grid@data$latitude[i])]
}

head(grid@data)
##   grid_id AREA PORTION longitude latitude new_ID
## 0     538 0.25     100 -114.2507    36.25     36
## 1     539 0.25     100 -113.7507    36.25     72
## 2     540 0.25     100 -113.2507    36.25     72
## 3     712 0.25     100 -114.7507    35.75     36
## 4     713 0.25     100 -114.2507    35.75     36
## 5     714 0.25     100 -113.7507    35.75     72

 

Join neighbouring cells based on new IDs & aggregate species distribution data using the new grid

grid_2degrees <- unionSpatialPolygons(grid, grid@data$new_ID)
uniqueIDs <- unique(grid@data$new_ID)
grid_2degrees <- SpatialPolygonsDataFrame(grid_2degrees, data=data.frame(new_ID=uniqueIDs, row.names = uniqueIDs))
head(grid_2degrees@data)
##      new_ID
## 1009   1009
## 1010   1010
## 1011   1011
## 1012   1012
## 1013   1013
## 1014   1014
plot(grid_2degrees)
plot(americas, add=TRUE)

species <- join(species, grid@data, by="grid_id", type="inner")
head(species)
##   grid_id spp_Id    GENUS EPITHET          species AREA PORTION longitude
## 1   13735    550 Aphandra natalia Aphandra_natalia 0.25     100 -78.25066
## 2   13736    550 Aphandra natalia Aphandra_natalia 0.25     100 -77.75066
## 3   13737    550 Aphandra natalia Aphandra_natalia 0.25     100 -77.25066
## 4   13738    550 Aphandra natalia Aphandra_natalia 0.25     100 -76.75066
## 5   13910    550 Aphandra natalia Aphandra_natalia 0.25     100 -78.25066
## 6   13911    550 Aphandra natalia Aphandra_natalia 0.25     100 -77.75066
##   latitude new_ID
## 1    -1.25    665
## 2    -1.25    701
## 3    -1.25    701
## 4    -1.25    701
## 5    -1.75    665
## 6    -1.75    701

 

4. Calculating phylogenetic community metrics

Prune phylogeny to exclude species not in palm distribution data
Make sure that the same number of species in your data set are in your phylogeny

palmtree_pruned <- drop.tip(palmtree,palmtree$tip.label[which(!palmtree$tip.label %in% unique(species$species))])
length(palmtree_pruned$tip.label) # 
## [1] 498
length(unique(species$species))
## [1] 498
write.nexus(palmtree_pruned, file="Data/palmtree_pruned.nex")

 

Convert long table to species by grid-cell table (a community matrix)

species <- unique(species[,c("new_ID","species")])
species_grid <- as.data.frame.matrix(table(species))
species_grid<-as.matrix(species_grid)

write.csv(species_grid, file = "Data/palms_specsxsites_phylo.csv")

 

Calculate phylogenetic diversity

palmtree_pruned<-read.nexus("Data/palmtree_pruned.nex") # if you have a hard time reading in the large palm phylogeny
species_grid <- as.matrix(read.csv("Data/palms_specsxsites_phylo.csv", row.names = 1)) # in case you got stuck above

pd_palms <- pd(species_grid, palmtree_pruned)
head(pd_palms)
##          PD SR
## 32 140.0006  2
## 33 140.0006  2
## 34 104.6140  1
## 35 104.6140  1
## 36 104.6140  1
## 67 140.0006  2

 

pd() calculates Faith’s PD and a corrected version of it being the residuals of a regression with total abundance or species richness per plot. Lets look into this:

pd_model <- lm(PD ~ SR, data = pd_palms)
pd_palms$residuals <- residuals(pd_model)
par(mfrow=c(1,2))
plot(pd_palms$SR, pd_palms$PD, main="Species richness vs Faith's PD", xlab="Species number",  ylab="Faith's PD", cex.main=0.8,cex.axis=0.8)
abline(pd_model)
plot(pd_palms$SR, pd_palms$residuals, main="Species richness vs Faith's PD residuals", xlab="Species number",  ylab="Faith's PD residuals", cex.main=0.8,cex.axis=0.8)

Calculate generic phylogenetic metrics (absolute and standardized)

Pphylogenetic diversity metrics are standardized by first using a null model to generate expected values.
Then, the observed value is compared to the null, usually using standardized effect sizes:
\(SES = (observed - mean.null)/sd.null\), where mean.null is the mean of the null values and sd.null is the standard deviation of the null values.

c.data <- comparative.comm(palmtree_pruned , species_grid)
pd.null<-generic.null(c.data, c(.pd,.mpd,.mntd),null.model = "richness",comp.fun=.ses,permute=100) 
colnames(pd.null) <- c("Faith_PD","Corrected_FaithPD","MPD", "MNTD")
rownames(pd.null) <- sites(c.data)
pd.null<-data.frame(pd.null)
pd.null$new_ID<-as.numeric(rownames(pd.null))

pd.out    <- pd.null[,c(21, 1,13,3,15,4,16)]  # select columns for final data set
pd.out [is.na(pd.out )] <- 0  # change NAs to zeroes; no phylogenetic diversity if grid has only 1 spp!

par(mfrow=c(1,2))
plot(pd_palms$SR, pd_palms$residuals, main="Species richness vs Faith's PD residuals", xlab="Species number",  ylab="Faith's PD residuals", cex.main=0.8,cex.axis=0.8)
plot(pd.out$Faith_PD.observed, pd.out$Faith_PD.SES,main="Faith's PD vs standardised Faith's PD",cex.main=0.8,xlab="observed Faith's PD",ylab="standardised Faith's PD",cex.axis=0.8)

 

a. Exercise II

Tasks:
1. Create a data.frame called ‘palmdiversity’ with species richness, PD and residual PD from ‘pd_palms’ and standardized pd metrics from ‘pd.out’.
2. Plot correlations among the diversity metrics.
3. Join the ‘palmdiversity’ data.frame to the grid_2degrees shapefile to be able to link the diversity metrics and the poylgons for plotting. 4. Plot maps with colours according to the diversity metrics.

Exercise II solutions:

Create a data.frame called ‘palmdiversity’ by joining ‘pd_palms’ with ‘pd.out’

##   new_ID       PD SR  residuals Faith_PD.observed Faith_PD.SES
## 1     32 140.0006  2  -96.80594          140.0006    -15.91277
## 2     33 140.0006  2  -96.80594          140.0006    -13.84592
## 3     34 104.6140  1 -112.12659          104.6140    -15.55744
## 4     35 104.6140  1 -112.12659          104.6140    -15.48190
## 5     36 104.6140  1 -112.12659          104.6140    -12.01928
## 6     67 140.0006  2  -96.80594          140.0006    -16.27082
##   MPD.observed   MPD.SES MNTD.observed  MNTD.SES
## 1      70.7732 -15.91277       70.7732 -15.91277
## 2      70.7732 -13.84592       70.7732 -13.84592
## 3       0.0000   0.00000        0.0000   0.00000
## 4       0.0000   0.00000        0.0000   0.00000
## 5       0.0000   0.00000        0.0000   0.00000
## 6      70.7732 -16.27082       70.7732 -16.27082

 

Plot correlations among the diversity metrics
 

Join the palmdiversity data.frame to the shapefile with the new gridcell IDs

##   new_ID       PD SR   residuals Faith_PD.observed Faith_PD.SES
## 1   1009 104.6140  1 -112.126586          104.6140   -16.320707
## 2   1010 190.8985  2  -45.908126          190.8985     8.548754
## 3   1011 212.9137  3  -43.958851          212.9137    -6.371583
## 4   1012 160.0228  2  -76.783808          160.0228    -7.815616
## 5   1013 252.0204  3   -4.852128          252.0204     6.280482
## 6   1014 329.4444  5   32.439925          329.4444    -2.904875
##   MPD.observed   MPD.SES MNTD.observed  MNTD.SES
## 1       0.0000  0.000000       0.00000  0.000000
## 2     172.5688  8.548754     172.56882  8.548754
## 3     129.7227 -2.677800      86.87658 -9.078391
## 4     110.8175 -7.815616     110.81746 -7.815616
## 5     155.7938  6.673172     139.01887  5.794896
## 6     142.3322  2.984331      88.67323 -5.864386

 

Plot maps with different colours for each diversity metric

Species richness, PD, MPD & MNTD (absolute)
 

PD, MPD & MNTD (standardized)